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A CLASS OF STATISTICAL MODELS TO WEAKEN 
INDEPENDENCE IN TWO-WAY CONTINGENCY TABLES 

By Enrico Carlini and Fabio Rapallo 
Politecnico di Torino and University of Eastern Piedmont 

In this paper we study a new class of statistical models for con- 
tingency tables. We define this class of models through a subset of 
the binomial equations of the classical independence model. We use 
some notions from Algebraic Statistics to compute their sufficient 
statistic, and to prove that they are log-linear. Moreover, we show 
how to compute maximum likelihood estimates and to perform exact 
inference through the Diaconis-Sturmfels algorithm. Examples show 
that these models can be useful in a wide range of applications. 

1. Introduction. One of the most popular statistical models for two- 
way contingency tables is the independence model. It has became a reference 
tool in applied research where categorical variables are concerned. In many 
applications the independence model is sufficient to describe and model the 
data, but this is not always the case. There are situations where the inde- 
pendence model does not fit the data and one has to detect more complex 
relations between the random variables. Thus, different models have been 
introduced in order to identify some structures in the contingency tables. 
Most of these models belong to the class of log-linear models. Among these, 
we recall the quasi-independence model, the quasi-symmetry model, the lo- 
gistic regression model. As a general reference for these models see again [3] . 
Such models have a wide spectrum of applications in, e.g., biology, psychol- 
ogy and medicine. The books by Fienberg [15], Fingleton [17], Le [26] and 
Agresti [3] present a great deal of examples with real data sets coming from 
the most disparate disciplines. 

A recent development in the area of statistical models for contingency 
tables involves the use of some tools from Algebraic Geometry to describe 
the structure and the properties of the models. This field is currently known 
under the name of Algebraic Statistics. While the first work on this direc- 
tion relates to a method for exact inference, see [13], following papers have 
focused their attention on the geometry of the statistical models through 
polynomial algebra. The algebraic and geometric point of view in the analy- 
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sis of probability models allows us to generalize statistical models in presence 
of cells with zero probability (toric models), to study its exponential struc- 
ture, and to make inference feasible also in models with complex structure. 
This approach has been particularly useful in the fields of log-linear and 
graphical models. Some relevant works on these recent topics are [20], [18], 
[19], and [31]. An exposition of such theory, with a view toward applications 
to computational biology, can be found in [27]. 

The theoretical advances mentioned above also have a computational 
counterpart. In fact, many symbolic softwares traditionally conceived for 
polynomial algebra now include special functions or packages specifically 
designed for Algebraic Statistics, see e.g. CoCoA [0], 4ti2 [1], and LattE 
[12]. 

In this paper we consider statistical models for two-way contingency ta- 
bles with strictly positive cell probabilities. We introduce a class of models 
in order to weaken independence, starting from the binomial representation 
of the independence model. The independence statement means that the 
table of probabilities has rank 1, and therefore that all 2 x 2 minors vanish. 
In the strictly positive case, this is equivalent to the vanishing of all 2 x 2 
adjacent minors. Our models, which we call weakened independence models, 
are defined through a subset of the independence binomial equations. As 
a consequence, the independence statements hold locally and the resulting 
models allow us to identify local patterns of independence in contingency 
tables. We study the main properties of such models. In particular, we prove 
that they belong to the class of log-linear models, and we determine their 
sufficient statistic. Moreover, we compute the corresponding Markov bases, 
in order to apply the Diaconis-Sturmfels algorithm without symbolic com- 
putations. The relevance of our theory is emphasized by some examples on 
real data sets. We also show that our models have connections with a prob- 
lem recently stated by Bernd Sturmfels in the field of probability models for 
Computational Biology, the so-called "100 Swiss Francs Problem", see [32]. 

While most of the papers in Algebraic Statistics uses algebraic and ge- 
ometric methods to describe and analyze existing statistical models, or to 
make exact inference, the main focus of this paper is the definition of a new 
class of models, by exploiting the Algebraic Statistics way of thinking. 

Notice that we restrict the analysis to adjacent minors. Therefore, the ap- 
plications are mainly concerned with binary or ordinal random variables. At 
the end of the paper we will give some pointers to follow-ups and extensions 
of this work. 

In Section 2 we define the weakened independence models and we give 
some examples, while in Section 3 we provide the computation of a suffi- 
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cient statistic. In Section 4 we prove that these models belong to the class of 
toric models (and therefore they are log-linear for strictly positive probabil- 
ities), and we explicitly write down some consequences, such as a canonical 
parametrization of the models. In Section 5 we compute the Markov bases 
for weakened independence models and we present some examples with real 
data. In particular, Example 5.3 is devoted to the discussion of some interest- 
ing relationships between our models and the "100 Swiss Francs Problem" . 
Section 6 highlights the main contributions of our theory and provides some 
pointers to future developments. 

2. Definitions. A two-way contingency table collects data from a sam- 
ple where two categorical variables, say X and Y, are measured. Suppose 
that X has I levels and Y has J levels. The sample space for a sample of size 
one is X = {1, . . . , 1} x {1, . . . , J} and a joint probability distribution for 
an I x J contingency table is a table of raw probabilities (pi j)i=i i j=i ... j 
in the simplex 



A statistical model for an / x J contingency table is then a subset of A de- 
fined through equations on the raw probabilities pi.i, . . . ,Pi,j- In this paper, 
we do not allow any pij to be zero, and we assume strict positivity of all 
probabilities. 

The independence model can be defined in parametric form through the 
power product representation, i.e. by the set of equations 



for i = 1, . . . , I and j = 1, . . . , J, where (f and Q are unrestricted positive 
parameters and Co is the normalizing constant, see [28]. In term of log- 
probabilities, Eq. (2.1) assumes the most familiar form 



where A = log Co, A^ = logCi for i = 1, ... ,1 and Aj = logQ for j = 
1, . . . J. As an equivalent representation, one can derive implicit formulae 
on the raw probabilities pij. Eliminating the £ variables from Eq. (2.1), one 
obtains the set of equations below: 




(2.1) 



(2.2) 



log ft j = A + Af +Aj 



Y 



(2.3) 



Pi,jPk,m Pi,mPk,j — 
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for all 1 < i < k < I and 1 < j < m < J. In other words, in the independence 
model all 2 x 2 minors of the table vanish. It is well known, see e.g. [3], that 
in the positive case, the equalities in Eq. (2.3) are redundant and it is enough 
to set to zero the adjacent minors: 

(2-4) I'i.jl'i- :.j ■ i - Pi+i,jPi,j+i 

for all 1 < i < I and 1 < j < J. 

Remark 2.1. In the framework of toric models as defined in [28], where 
structural zeros are allowed, the implicit representations (2.3) and (2.4) are 
not equivalent, as they differ on the boundary. For a description of such 
phenomenon, see [31]. 

In algebraic terms, let 

C = {PhjPt+ij+i ~ Pi+i,jPi,j+i : 1 < * < I, 1 <j < J} ■ 

The set C is the set of all 2 x 2 adjacent minors of the table of probabilities. 
Moreover, let K[p] be the polynomial ring in / x J indeterminates with real 
coefficients. 

From the geometric point of view, the independence model is the variety 

Ve = {p i:j : C = o}nA, 

i.e., the set of the points of the simplex where all binomials in C vanish. 

The choice of a subset of C leads us to the definition of a new class of 
models. 

Definition 2.2. Let B be a subset of C. The ^-weakened independence 
model is the variety 

V B = {pi,j : £ = 0}nA. 

Of course, Vq C Vq for all subsets B of C. The meaning of the class of 
models in Definition 2.2 is quite simple. In fact, the choice of a given set of 
minors means that we allow the binomial independence statements to hold 
locally, i.e., we determine patterns of independence. 

Example 2.3. As a first applications, we consider a 2 x J contingency 
table. A table of this kind could derive, e.g., from the observation of a binary 
random variable X at different times. 

The model defined through the set of binomials 



6 = {Pl,lP2,2 ~ Pl,2P2,l,Pl,2P2,3 ~ Pl,3P2,2, ■ ■ ■ ,Pl,j'-lP2,j' ~ Pi ,j'P2,j'-l } , 
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Fig 1. Binomials for a change-point problem in logistic regression. 



where f < J, is presented in Figure 1. This choice of B means that there is 
independence between X and the time up to the instant j' and not after. In 
literature, the point j' in this model refers to the detection of the change- 
point in a logistic regression model. A recent paper about this topic is [21]. 

Example 2.4. Let us consider a I x I contingency table. A table of this 
kind could derive from a rater agreement analysis. Suppose that 2 raters 
independently classify n objects using a nominal or ordinal scale with / 
categories. If we set 

& = {Pl,lP2,2 -pi,2P2,l} 

the corresponding model yields that categories 1 and 2 are indistinguishable. 
A reference for the notion of category indistinguishability is, e.g., [11]. This 
model can be generalized using the set of binomials 

B = {Pi,jPi+i,j+i ~ Pi,j+m+i,j ■ 1 < i < i', 1 < j < i'} 

meaning that the categories 1, . . . ,i' are indistinguishable. The first paper 
in the direction of modelling patterns of agreement is [2] . An example with 5 
categories and 3 undistinguishable categories is presented in Figure 2. More 
examples on the models for rater agreement problems will be presented later 
in the paper. 

Remark 2.5. In the next sections, our approach will proceed somehow 
backwards with respect to the classical log- linear models theory. In fact, we 
will define the model through the binomials and then we will use them to 
determine a sufficient statistic and a parametrization. 

3. Sufficient statistic. As noticed in the Introduction, the indepen- 
dence model is defined through the log-linear form in Eq. (2.2). One can 
easily check that for the independence model a sufficient statistic T for the 
sample of size 1 is given by the indicator functions of the I rows and the 
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Fig 2. Binomials for Example 2.4- 

indicator functions of the J columns. More precisely, we denote the indica- 
tor function of the i-th. row by and the indicator function of the j-th. 
column by Writing the sample space as X = {1, . . . , 1} x {1, . . . , J}, 
a sufficient statistic for the independence model is 

T= (l(i, + ),...,I(7 !+ ),I( +) i),...,I( +) j)J . 

A single observation is an element of the sample space X and its table has a 
single count of 1 in one cell and otherwise. This observation yields a value 
of 1 in the corresponding row and column indicator functions in T. 

Therefore, the sufficient statistic T for a sample of size 1 is a linear map 
from X to N /+J . The function T can be extended to a linear homomorphism 
T : R IJ — > R I+J . 

In Section 4 we will prove that weakened independence models, as the 
independence model, are log-linear. Thus, the sufficient statistic for a sample 
of size n is the sum of the sufficient statistics of all components of the sample 
and it will be formed by the sum of appropriate cell counts, as familiar in 
the field of categorical data analysis, see e.g. [3]. However, in this section it 
is more convenient to work with a sample of size one and with the indicator 
functions. This approach has been fruitfully used in [22] and, more recently, 
in [28]. 

Hereinafter, we write the table as a column vector, i.e. the table of prob- 
abilities is written as 

P = (Pi,i, ■ ■ ■ ,Pi,J, ■ ■ ■ ,Pi,i, ■ ■ ■ ,Pl,jf i 

where t denotes the transposition. Moreover, we use a vector notation, i.e. 
we write a binomial in the form p a —p b , meaning p°^^ ■ ■ ■ p^f —p^i ■ • • p b /j ■ 
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We briefly review the relationship between the sufficient statistic and the 
binomials in Eq. (2.4). Writing the table as a column vector of length I J, the 
matrix representation of T is a matrix Ac ■ This matrix has size I J x ( I + J) 
and its rank is / + J — 1. 

Moreover, consider the log-vector of a 2 x 2 minor to be defined in the 
following way: 

A : R[p] — > R IJ 
p a — p b i — > a — b 

We denote by Zc the sub-vector space of W IJ generated by the vectors 
A(m), for all 2 x 2 adjacent minors m. It is well known, see for example 
[6], that Zc has dimension (I — 1)(J — 1) and the sequence of log- vectors 
A(m) with m 6 C is a sequence of (/ — 1)(J — 1) linearly independent 
vectors orthogonal to Ac- Hence the column space Ac is the orthogonal of 
Zc Thus, from a vector-space perspective, the exponents of the binomials 
are the orthogonal complement of the matrix Ac- In the sequel, we will use 
the same symbol to denote a matrix A and the sub-vector space of W IJ 
generated by the columns of A, although this should be considered as a 
slight abuse of notation. 

The procedure described above is quite general and it provides a method 
to actually compute the relevant binomials of a statistical model with a 
given sufficient statistic. For more details, see [28]. 

In order to analyze the weakened independence models in Definition 2.2, 
we use the theory sketched above for the independence model. We start 
with a set of binomials, we compute a sufficient statistic and the parametric 
representation of the model. 

Remark 3.1. We will prove in Section 4 that weakened independence 
models are log-linear. Therefore, the orthogonal to the log-vectors of the 
chosen binomials is the matrix representation of a sufficient statistic. In order 
to keep notation as simple as possible, we call this orthogonal a sufficient 
statistic even before showing that the models are log-linear. 

Lemma 3.2. The log-vectors of d distinct adjacent minors are linearly 
independent. 

Proof. Let Bd be a set of d distinct adjacent minors and let Ld be the set 
of their log-vectors. We proceed by induction on d. For d = 1 the statement 
is clearly true. We assume that the elements of Lj are linearly independent 
for all i < d and we will show that the same holds for Ld- Let m 6 Bd be the 
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minor involving the indeterminate having the lex-smallest index, say pjj, 
and notice that no other element in B^ involves pjj. Let I = A(m) £ and 
notice that I is not a linear combination of the element of \ {/} which 
are linearly independent by hypothesis. Hence the element of are linearly 
independent. □ 

Remark 3.3. Lemma 3.2 is false when we consider log- vectors of non- 
adjacent minors. As a counterexample, take a 2 x 3 table and all three 
minors. 

Now, consider a ^-weakened independence model with set of adjacent 
minors B of cardinality m. Let Zb be the matrix of the log- vectors of the 
adjacent minors in B. In view of Lemma 3.2, the orthogonal of Zg has 
dimension (77 — m). Thus, the explicit computation of As, the orthogonal 
of Zq, requires to find at least (7 J — m) vectors orthogonal to Zg. Although 
this can be done simply with a linear algebra algorithm, it is very useful to 
investigate the structure of the the matrix Aq. 

Given a ^-weakened independence model for 7 x J contingency tables, we 
define a graph in the following way. 

Definition 3.4. Given a set B of adjacent minors, we define a graph 
Gb as follows: the set of vertices is the set of cells and each binomial defines 
4 edges. The binomial pijpi+ij+i — pjj+ipj+ij defines the edges (i,j) <-» 
(i+l,j), (i+lj+l), <-> (t+l,j'+l) and{i,j) (ij+l). 

The edges associated to a binomial are the 4 sides of the square with 
vertices on the 4 cells involved in the binomial. 

Definition 3.5. A cell (i,j) is a free cell if no edge of Gb involves (i,j)- 

Equivalently, a cell is a free cell if and only if the indeterminate pij 
does not appear in any of the binomials in B. 

Definition 3.6. The sequence of cells + + h) is 

a connected component of the i-th row if each pair of consecutive cells is 
connected by an edge of Gb ■ The sequence forms a maximal connected row 
component (MCR) if the sequence is no more connected when one adds 
- 1) or + h + l). 

One can define similarly the maximal connected column component (MCC). 
We illustrate the definitions above with an example. 
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Fig 3. Binomials for Example 3.7. 

Example 3.7. In the model for a 4 x 4 contingency table defined through 
the binomials in Figure 3, we have 4 MCRs, 5 MCCs and 2 free cells. 

Proposition 3.8. Consider a B-weakened independence model with set 
of binomials B and let Zq be the matrix of the log-vectors of the minors in 
B. The indicator vectors of the free cells, the indicator vectors of the MCRs 
and the indicator vectors of the MCCs are orthogonal to the column space 

Proof. If is a free cell, then no monomial in B involves the corre- 
sponding variable. Hence the indicator vector of (i,j) is orthogonal to the 
column space of Z&. Given a MCR, its indicator function is clearly orthog- 
onal to the columns of Zg corresponding to minors not involving the cells 
of the MCR. If a minor involves a cell of the MCR, then it involves two 
cells with alternating signs. A similar argument works for MCCs. Hence the 
orthogonality follows. □ 

Now, two questions arise: one about the linear independence of the vectors 
defined in Proposition 3.8 and the other about the dimension of the sub- 
vector space generated by such vectors. In other words, we have to investigate 
whether these vectors generate the space orthogonal to Z& or not. Let us 
start with two simple examples. 

Example 3.9. Consider a weakened independence model for 4x4 tables 
defined through the adjacent minors in Figure 4. In this situation, Z& has 
rank 3 and there are 4 MCRs, 4 MCCs and 6 free cells. Here, the 14 vectors 
corresponding to the MCRs, to the MCCs and to the free cells generate 
a sub-vector space of dimension 13. Thus, they are enough to define the 
matrix Aq. 
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Fig 4. Binomials for Example 3.9. 



Fig 5. Binomials for Example 3.10. 



Example 3.10. Consider now a weakened independence model for 4x4 
tables denned through the adjacent minors in Figure 5. The model above 
only leaves out one minor, namely the central one. In this case, Zg has rank 
8 and there are 4 MCRs and 4 MCCs. The 8 vectors corresponding to the 
MCRs and to the MCCs generate a sub- vector space of dimension 7 and 
therefore they are not enough to generate the orthogonal space As- 

The dimension of the vector space generated by the indicator function of 
the MCRs, the MCCs and the free cells can be computed and we have the 
following results. 

Proposition 3.11. For any connected component of B with r MCRs 
and c MCCs, the vector space generated by the MCRs and by the MCCs 
has dimension (r + c — 1). 

Proof. Clearly the log- vectors of the MCRs and of the MCCs are not 
linearly independent as their sums are equal. To show that this is the only 
relation we proceed by induction on the number of minors in the connected 



MODELS TO WEAKEN INDEPENDENCE 



11 



component. Let this number be d. If d = 1 the result is trivial. Now, assume 
that the result holds for d. If the connected component involves d+1 minors, 
let ri, . . . , rt be the indicator functions of the MCRs and c±, . . . ,c s be the 
indicator functions of the MCCs. Also assume that c\ and r\ involve the 
lex-smallest cell. Notice that c\ and r\ are the only vectors involving this 
cell. Given a linear combination 

(3.1) ^2kci = ^2fJ,m 

we must have Ai = Then the linear combination (3.1) can be read in 
B' = B \ {fn}, where m is the minor involving the lex-smallest cell and the 
q's and the r^s represent the log- vectors of the MCRs and the MCCs of 
B' . By the inductive hypothesis we get Aj = in = 1 for all i. □ 

As distinct connected components and free cells act on spaces which are 
orthogonal to each other, Proposition 3.11 leads to the following corollary. 

Theorem 3.12. Consider a B-weakened independence model defined by 
a set of binomials B whose graph has k connected components and with 
r MCRs, c MCCs and f free cells. The dimension of the vector space 
generated by MCRs, MCCs and the indicator functions of the free cells is 
(r + c + f-k). 

Proof. The indicators of the free cells are clearly independent with the 
indicators of the MCCs and of the MRCs. Moreover, indicators of MCCs 
and MCRs of different connected components are linearly independent as 
they do not share any cell. By Proposition 3.11 each connected component 
gives exactly one relation among the indicators of the MCRs and the MCCs. 
Hence the result follows. □ 

In the results above, we have addressed dimensional issues. Now, we use 
them to find a procedure to determine a sufficient statistic. Moreover, the 
examples of this section show that in some cases the vectors of MCCs, 
MCRs and free cells are sufficient to generate the space orthogonal to Zg. 
Clearly, these vectors are not sufficient when the graph Gq of the binomials 
in B present a hole, i.e., when we remove some minors with 4 double edges 
from the complete set of binomials C. Removing such a minor adds a new 
vector to the orthogonal. On the other hand, it does not add anything in 
terms of MCRs, MCCs and free cells. 

Thus, the last part of this section is devoted to actually find a sufficient 
statistic for a generic weakened independence model. The key idea is to start 
from the complete set of adjacent minors C and to remove minors iteratively. 
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This approach is motivated by the fact that for the complete set C a sufficient 
statistic is known to be formed by the row sums and the column sums, as 
extensively discussed in Section 2. 

We begin our analysis from a simple case. Namely, we consider a set of 
binomials B with given sufficient statistic Aq and we investigate the behavior 
of the sufficient statistic when we remove one minor m from B, i.e., when 
the set of binomials is B' = B \ {m}. We separate two cases, depending on 
the number of double edges of the removed minor. 

Lemma 3.13. Consider a weakened independence model obtained remov- 
ing a binomial with four double edges by a given family of adjacent binomials 
B, i.e. let B' = B \ {m} where m has four double edges. If we let Aq be the 
orthogonal to Zq, then the orthogonal to Z& is generated by the elements 
of Aq and by the indicator vector Q of a quadrant centered on one of the 
indeterminates of the removed minor. 

Proof. First notice that the elements of Aq are orthogonal to the columns 
of Zqi, i.e. A is' ^ A is, and clearly, by Lemma 3.2, one has 

dim(A B i) = dim(A B ) + 1 , 

where Aqi is the orthogonal to Zqi. Now let Q be the indicator vector of a 
quadrant centered on one of the indeterminates of m. Then, Q Aq as it is 
not orthogonal to the log- vector of m. But, Q € Aqi as each binomial in B' 
either avoid the quadrant, or it is contained in the quadrant, or has exactly 
two elements on the border of the quadrant. This is enough to complete the 
proof. □ 

The quadrant to be used in Example 3.10 is sketched Figure 6. 

Lemma 3.14. Consider a weakened independence model obtained remov- 
ing a binomial with not all the edges double by a given family of adjacent 
binomials B, i. e. let B' = B \ {m} where m has not all the edges double. If 
we let Ag be the orthogonal to Zq, then the orthogonal to Z& is generated 
by: the elements of Ag, the indicator vectors of the MCCs, of the MCRs 
and of the free cells. 

Proof. Clearly, the elements of As are orthogonal to the column of Z B >, 
i.e. Aqi D Aq, and by Lemma 3.2 one has 

dim(^ B /) = &im{A B ) + 1 , 

where Aq> is the orthogonal to Z&. The removed binomial m, with not all 
the edges double, can be one of the following 
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Fig 6. Binomials of Example 3.10. The dashed line delimits the quadrant defined in Lemma 
3.13. 





and to complete the proof we only need to present, in each case, a vector Q 
in Aqi which is not in In case (a), either we have a new free cell or not. 
If we have, let Q be the indicator vector of the free cell. Clearly, Q (jL Ajg, 
but Q € Aq> has no minor is involving the variable corresponding to the 
free cell. If we do not have a new free cell, then we have a new MMC or a 
new MCR and its indicator vector is the required one. Repeating this kind 
of argument in cases (b) through (e) we complete the proof. □ 

We are now ready to analyze the general case. 

Definition 3.15. Let B be a set of adjacent minors and consider its 
complement B in the set of all adjacent minors. Let Gq be the graph asso- 
ciated with B. For each connected component of Gq not touching the border 
of the table, we consider the lex-smallest variable and we call it a corner. 

Remark 3.16. We notice that the number of corners is just the number 
of holes one can find in the graph defined by the set of monomials. 
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Theorem 3.17. Let B be a set of adjacent minors. Then the orthogonal 
to Zg is generated by the indicator vectors of the MCCs, of the MCRs, of 
the free cells and by quadrants centered in variables corresponding to corners. 

Proof. B can be constructed by the set of all the adjacent minors by 
removing a minor at each time. Using Lemmas 3.13 and 3.14 we only need to 
show the result for the set of all the adjacent minors, but this is a straight- 
forward consequence of Lemma 3.2 and Theorem 3.12. □ 

Remark 3.18. In alternative to the straightforward use of Theorem 3.17 
one can apply Lemmas 3.14 and 3.13 to determine a sufficient statistic for a 
weakened independence model with set of binomial B. It is enough to start 
from the complete set of adjacent minors and remove one by one the minors 
not in B. Notice that such an iterative procedure, and the theorem itself, 
yield a system of generators of the space orthogonal to Zg, but not a basis, 
i.e. some of the vectors we add are redundant. 

We will show some examples and applications of Theorem 3.17 in the next 
sections. 

4. Exponential models. In Section 3 we have carried out some com- 
putations to determine a sufficient statistic of a weakened independence 
model. We are now able to find a parametric representation of the model. 

Let us introduce unrestricted positive parameters £i , . . . , (, s , where s is the 
number of columns of the matrix If A& has full rank, then s coincide 
with the dimension of the vector sub-space orthogonal to 

The first step is to prove that weakened independence models belong 
to the class of toric models and therefore they are exponential (log-linear) 
models on the strictly positive simplex. The first result in this direction is a 
rewriting of a general theorem to be found in [20]. 

Remark 4.1. The main result in [20] can be applied to statistical models 
when the matrix representation Aq of the sufficient statistic has non-negative 
entries. Therefore, the theory developed in Section 3 is relevant not only to 
actually determine a sufficient statistic, but also to derive further theoretical 
properties of the weakened independence models. 

Here we use again a vector notation in order to improve the readability 
and simplify the formulae. 
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Theorem 4.2 (Geiger, Meek, Sturmfels (2006), Th. 3.2). Given a B- 
weakened independence model, it can be expressed as 

(4.1) p(C) = ( Ab 

apart from the normalizing constant. 

Clearly, the parametrization in Eq. (4.1) is not unique. Theorem 4.2 pro- 
vides an easy way to switch from the implicit representation to its parametriza- 
tion. It is enough to consider the matrix Ag, whose columns are orthogonal 
to the log-vectors of the binomials. As the columns of Aq can be chosen 
with non-negative entries, then each binomial in B vanishes in all points 
of the form given in Eq. (4.1). This follows from a direct substitution. The 
converse part is less intuitive and the proof is not obvious. 

As a corollary, Theorem 4.2 allows us to consider weakened independence 
models into the larger class of toric models, as described in [28] and [31]. In 
the following result, we summarize the main properties inherited from toric 
models. 

Proposition 4.3. Consider a B-weakened independence model 

1. Vq is a toric model; 

2. With the constraint p > 0, Vis is an exponential model; 

3. In case of sampling, the sufficient statistic for the sample of size n is 
the sum of the sufficient statistic of all components of the sample. 

Proof. See Theorem 2 and the discussion in Section 3 of [31]. □ 

Remark 4.4. As noticed in [31], when we consider the general case p > 
instead of p > 0, the toric model is not an exponential model. Nevertheless 
it can be described as the disjoint union of a suitable number of exponential 
models. 

We conclude this section with two examples. 

Example 4.5. As a first example, we consider a statistical model for 
3x3 contingency tables defined through the binomials in Figure 7. Using 
the theory developed in the previous section, the MCRs, the MCCs and 
the free cells are sufficient to describe the orthogonal Zg and therefore the 
relevant matrices are in Table 1. Thus, a parametrization with parameters 
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Fig 7. Binomials for Example 4.5. 
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Table 1 

Matrices Zb and Ab for Example 4-- 5. 



Ci,. • • ,(s is: 




P3,l =C8 
P3,2 =C3C5 
P3,3 =(3(6 



Example 4.6. Let us consider the weakened independence model for 
4x4 denned in Example 3.10. In that model, a minor with four double edges 
has been removed and consequently the MCRs, MCCs and free cells are not 
sufficient to describe the orthogonal Zg. A vector must be added according 
to Theorem 3.13. Following the same approach as in the previous example 
one can easily write down the matrices Zq and Aq. A parametrization with 
parameters (\, . . . , £g is: 





=ClC5 


' P2,l 


=C2C5 


' P3,l 


=CsC 5 
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5. Inference and examples. In the previous sections we have defined 
and studied the weakened independence models. When the statistical model 
is given through a set of adjacent minors B, we are now able to compute a 
sufficient statistic for a sample of size 1 (see Proposition 4.3) and to find a 
parametrization of the statistical model Vjg (see Theorem 4.2.) In this section 
we give some ideas on how to compute maximum likelihood estimates (MLE) 
and to perform exact inference through Algebraic Statistics. 

In the independence model defined by the set C of all adjacent minors, the 
maximum likelihood estimate can be expressed in closed form in terms of the 
observed value of the sufficient statistic T. In the weakened independence 
models this is no longer true, but numerical algorithms for log-linear models 
can be used. As pointed out in Section 4, at least in the strictly positive case, 
a weakened independence model is log-linear and thus modified Newton- 
Rap hson methods, Iterative Proportional Fitting or EM methods can be 
used, see e.g. [3]. To compute the MLEs of of the cell probabilities for the 
examples in this paper, we have used the R software, see [29] , together 
with the package glim (generalized log- linear models), see [14]. This package 
allows to define a generic matrix for the sufficient statistic. This is the main 
advantage of the glim package with respect to other available procedures in 
different software systems. 

Another theoretical result which highlights once again the interplay be- 
tween statistical models and polynomial algebra is the Birch's Theorem (see 
e.g. [6]). It states that the MLE is the unique point p of the model Vq which 
satisfies the constraints Agp = A^pobs , where p a b s are the observed frequen- 
cies. We will have the opportunity to apply such result later in Example 



Once the MLE is available, the goodness-of-fit can be evaluated through 
a chi-squared test. The Pearson test statistic 



are evaluated and compared with the chi-square distribution with #£> de- 
grees of freedom, where j^B is the cardinality of B, see [22] or [6]. 

Alternatively, one can run the goodness-of-fit test within Algebraic Statis- 
tics, using a Markov Chains Monte Carlo (MCMC) algorithm. A number of 



5.3. 



(5.1) 



{Pi,j-Pi,j) 2 
hi 



or the log-likelihood ratio test statistic 



(5.2) 
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papers have shown the relevance of this approach, see [13], and e.g. [30], [4], 
and [8]. The algebraic MCMC algorithm was first described in [13], and it is 
by now widely used to compute non-asymptotic p- values for goodness-of-fit 
tests in contingency tables problems. 

Let h be the observed contingency table for a sample of size n, written as 
a vector in N IJ . The MCMC algorithm is useful to efficiently sample from 
the reference set Tt of a contingency table h given a sufficient statistic with 
matrix representation Aq, i.e., from the set 



The algorithm samples tables from Tt with the appropriate hypergeometric 
distribution 7i through a Markov chain based on a suitable set of moves 
making the chain connected. Such set of moves is called a Markov basis. 
With more details, a Markov basis is a set of tables {mi,...,mi} with 
integer entries such that: 

• Aym^ = for all k = 1, . . . , L 

• if hi and \i2 are tables in Tt-, there exist moves , • • • , vri^ A and signs 
ei,...,€A (e a = ±1) such that 



These conditions ensure the irreducibility of the Markov chain defined by 
the algorithm below: 

• Start from a table h\ & Tt; 

• Choose a move uniformly in {mi, . . . , mx,} and a sign e uniformly 
in {—1, 1}. Define h,2 = h\ + eiJik] 

• Choose u uniformly distributed in the interval [0,1]. If h.2 > and 
min{l, H(h,2)/H(hi)} > u then move from h\ to /i2, otherwise stay at 



In the general case, the computation of a Markov basis needs symbolic 
computations (the Diaconis-Sturmfels algorithm). Nevertheless, a Markov 
basis for the weakened independence models can be derived theoretically. In 
the following, we will determine a Markov basis for the weakened indepen- 
dence models. 

Given the set B of binomials defining the B- weakened independence model, 
let Aq be the matrix representation of the sufficient statistic. Moreover, we 
denote by Ig the polynomial ideal in R[p] generated by the binomials in B. 
Diaconis and Sturmfels ([13], Theorem 3.1) proved that a Markov basis is 





a=l a=l 
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Second instructor First instructor 





1 


2 


3 


Total 


1 


7 


5 





12 




(6.52) 


(5.48) 


(0) 




2 


4 


5 


2 


11 


3 


(4.48) 
1 


(3.76) 
5 


(2.76) 
5 


11 




(1) 


(5.76) 


(4.24) 




Total 


12 


15 


7 


34 



Table 2 

Evaluation of 34 homeworks by two instructors. In parentheses are the the MLE 
estimates for the weakened independence model. 

formed by the log- vectors of a set of generators of the toric ideal associated 
to Aq, i.e. the ideal 

Jb = {p a -p b \a,b£ R IJ , A B {a) = A l B {b)}. 

Therefore, the computation of a Markov basis translates into the compu- 
tation of a set of generators of a toric ideal. 

Bigatti et al. [5] showed that the toric ideal associated to A& is the sat- 
uration of Xg with respect to the product of the indeterminates. Such ideal 
is defined as: 

T-B ■ (pi,i • • -Plj) 00 = {/ S R[p] | (pi,i • • -pi,j) n f G 1b for some n} . 

In order to compute a set of generators of this ideal, one can use symbolic 
algebra packages, e.g. the function Toric of CoCoA. For further details on 
ideals and their operations, see e.g. [10] and [25]. 

Example 5.1. The data we present as a first example in this section 
have been collected by one of the authors in his Biostatistics course. Each of 
the 34 students must submit a homework before the exam and this report is 
evaluated by two instructors on a scale with levels {1, 2, 3}. The final grade 
is the maximum of the two evaluations. The data are in the Table 2. 

The model we use to analyze such data is the model defined by the adja- 
cent minors in Example 4.5. Using the glim package, we obtain the MLEs 
written in parentheses in the table. The Pearson statistic is 0.9863. Running 
a MCMC algorithm with a Markov basis consisting of 2 moves, we find a 
p-value of 0.6665 for the goodness-of-fit test, showing a good fit. The Monte 
Carlo computations are based on a sample of 10, 000 tables, with a burn-in 
phase of 50, 000 tables and sampling every 50 steps. 
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Smoking level HDLC 
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Total 
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15 
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25 
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2 


21 
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11 
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15 


3 


35 
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11 
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22 


Total 


39 


14 


39 


11 


103 



Table 3 

Cross-classification of Smoking level and HDLC. Smoking levels: 1 = "No Smoking", 
2 = "Less than 5 cigarettes" , 3 = "Less than 10 cigarettes" , 4 = "More than 10 cigarettes" . 
HDLC levels: 1 = "Normal", 2 = "Low Normal", 3 = "Borderline", 4 = "Abnormal" . 

Models of this kind are used in [7] to detect category indistinguishability 
both in intra-rater and in inter-rater agreement problems. The model we 
used shows that categories 1 and 2 are confused, as well as categories 2 
and 3. This lack of distinguishability can be ascribed to a relevant non- 
homogeneity of the marginal distributions. 

Example 5.2. The data in Table 3 show the cross-classification of 103 
subjects with respect to 2 ordinal variables: the smoking level, 4 categories 
from "No Smoking" to "More than 10 cigarettes" , and the quantity of High- 
Density Lipoprotein Cholesterol (HDLP) in the blood, 4 categories from 
"Normal" to "Abnormal". The data are presented in [24] and analyzed by 
the authors under both the independence model and the RC (Row Column 
effects) model. The authors compute the exact p- values for the independence 
model (0.049) and for the RC model (0.657) using the log-likelihood ratio 
test statistic. 

We use a weakened independence model with binomials in Figure 8. Ac- 
cording to Theorem 3.17, a sufficient statistic is formed by 4 MCRs, 4 
MCCs and 3 free cells. Moreover, the relevant Markov basis has 14 binomi- 
als. 

With our model we find a p-value of 0.7205. Therefore, this weakened 
independence model fits better than the independence model and it is as 
good as the RC model. In particular, removing only three adjacent minors 
from the complete configuration with 9 minors, we obtain a model whose fit 
is dramatically improved. Moreover, the removed minors allow us to identify 
quickly the cells which cause the departure from independence. 

Example 5.3. To conclude this section, we show how the models de- 
fined in the present paper have some relationships with a recent problem, 
first stated by Bernd Sturmfels in 2005 and known as the "100 Swiss Francs 
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Fig 8. Weakened independence model for the cholesterol data. 

Problem", see [32]. Such problem is related to the modelling of DNA se- 
quence alignments. We briefly describe the probabilistic experiment. For a 
plain description, the reader can refer to [27], Example 1.15. 

A DNA sequence is a sequence of symbols in the alphabet {A, T, C, G}. A 
major problem in molecular biology is to compare two DNA sequences. In 
[32], the following observed sequences were considered: 

ATCACCAAACATTGGGATGCCTGTGCATTTGCAAGCGGCT 
ATGAGTCTTAAAACGCTGGCCATGTCCATCTTAGACAGCG 

leading to the observed table below: 

2 2 \ 
2 2 
4 2 
2 4 j 

The hypothesis of the author is that such two DNA sequences are generated 
through a (biased) coin and four tetrahedral dice D\ , D2 , -D3 , -D4 with the 
letters A, T, C, G on the facets. When the coin outcome is "Head", then 
the dice D\ and D2 are rolled. The outcome of D± is registered in the first 
sequence and the outcome of D2 in the second one. When the coin outcome 
is "Tail", then the dice -D3 and -D4 are rolled. 

Denote by q\ , . . . , q^ the probability vectors of the four dice and by a the 
probability of "Head" in the coin. Then, the probability distribution of the 
final outcome is 

(5.4) aq x q\ + (1 - a)q 3 q\ . 

Therefore, the construction of this experiment leads us to consider the 
statistical model of 4 x 4 matrices of probabilities whose rank is less than or 



(5.3) 



( 4 2 

2 4 

2 2 

V 2 2 
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Fig 9. Weakened independence models Mi (left) and M2 (right) for the 100 Swiss francs 
problem. 

equal to 2. The author conjectured that the maximum likelihood estimate 
of the probabilities under the model of matrices with rank at most 2 is: 

3 2 2 \ 
3 2 2 
2 3 3 
2 3 3 / 

Further analyses to be found in [16] show that the model is non-identifiable 
and that numerical methods are able to identify 3 global maxima and 4 local 
maxima for the likelihood function. Apart from the simultaneous permuta- 
tion of the row and column labels, the global maximum is reached for the 
matrix in Eq. (5.5) and the local maximum is obtained by the matrix: 





( 8/3 


8/3 


8/3 


2 ^ 


1 


8/3 


8/3 


8/3 
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40 


8/3 


8/3 


8/3 


2 




^ 2 


2 


2 


4 J 



Now, consider the following probability models for 4x4 contingency ta- 
bles: 

• the model M of the matrices with rank at most 2; 

• the weakened independence models M\ and M2 whose binomials are 
presented in Figure 9. 

One can easily check that both Mi and M2 are proper subsets of M . In 
fact, in Mi the first row is proportional to the second row and so are the 
third and the fourth, while in M2, the first three rows are proportional. 

Now it is easy to check by direct substitution that the matrix P g of global 
maximum in Eq. (5.5) belongs to Mi, while the matrix P/ of local maximum 



(5.5) 



1 

40 
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in Eq. (5.6) belongs to M 2 . Such matrices are the MLEs for the two models, 
respectively. 

Remark 5.4. This is not a proof that the matrix P g is the MLE for the 
model M. Nevertheless, it is interesting to notice that our models contain 
both the local and the global maxima. However, our model can not suffice to 
find the local extrema in M as M is not a toric model, e.g., it is not defined 
by binomials. 

6. Final remarks and future work. In this paper we used the bi- 
nomial representation of the independence model to introduce a new class 
of statistical models:these models are devised to weaken independence. We 
studied their sufficient statistic, we proved that they are log- linear models 
for strictly positive probabilities and we showed how to make inference on 
these models. Some numerical examples emphasized the importance and the 
wide applicability of our models. 

We have in mind different ways to generalize this work. First, we want to 
find a procedure to characterize the relevant Markov bases to be used in the 
Diaconis-Sturmfels algorithm. Then, we plan to consider models defined by 
non-adjacent 2x2 minors and try to analyze them with similar techniques. 
Moreover, we are interested in the study of higher dimensional minors, e.g., 
3x3 minors which appear in the definition of the models in Example 5.3. 
Finally, for large tables there will be many weakened independence models 
and model selection strategies must be studied. Further applications of this 
kind of models will be investigated, in particular in the field of computational 
biology. From a geometrical point of view, we would like to explore the 
structure of the varieties defined by some adjacent minors, as done in [23] 
when all adjacent minors are considered. 
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